-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type validation of interpolations #578
Conversation
fc8b08a
to
5ae4efa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know those two features are related, but this would have been easier to review as two pull requests:
- validating and converting on interpolation access
- converting dict and list to DictConfig and ListConfig when returned from custom resolvers.
docs/source/structured_config.rst
Outdated
omegaconf.errors.ValidationError: Value 'string' could not be converted to Integer | ||
full_key: int_key | ||
object_type=Interpolation | ||
>>> cfg.str_key = 1234 # convert int to str (assignment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion has nothing to do with interpolation and we are thinking about deprecating it anyway.
>>> cfg.str_key = 1234 # convert int to str (assignment) | |
>>> cfg.str_key = "1234" # string value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so that was related to how the original text also mentioned assignment... all fixed in a80ae95
I hear you, although to be clear, in that case what happened is that the code I was writing for (1) turned out to also address (2). It's not like one was written on top of the other and combined in a single commit. |
cd8655b
to
b556a82
Compare
@omry I think I have addressed all comments except #578 (comment) on which I'd like to have your input first. Also, something I just realized, is that if we have an cfg = OmegaConf.create({"x": "???"})
cfg.x = {1, 2, 3} I see 4 options:
My preference would be for (2) or (3) (option (4) seems really bad as it would break a bunch of stuff, just mentioning it for the sake of completeness) |
I am not addressing any of your other comments yet, but I wanted at a high level (and in response to one of your questions): This is what I think: Doing it properly would require a lot of thinking and I just thinking about it (before reading your comments) made me realize we might want to drop the feature of generating DictConfig/ListConfig from 2.1 and reconsider for 2.2. We need to wrap things up for 2.1 and this one seems like a pretty deep rabbit hole. I think we have two options:
Either way we can revisit for 2.2. |
To be clear (there's quite a lot in in the PR commentss to it's a bit hard to follow, sorry), the new feature of "generating dictconfig and listconfig" works fine and actually has validation (because What doesn't get validated is direct node interpolation, e.g., So my suggestion would be to keep it like this but open an issue regarding the validation of container nodes for direct node interpolation (which could be fixed after the 2.1 release). I didn't understand your comment regarding caching though: which caching mechanism are you talking about? |
When I say "generating", I am referring to custom resolvers returning dict or list.
As for validation when accessing another node via interpolation: Since the other node is already a valid DictConfig/ListConfig, I think we can get away with shallow validation (let me know if you disagree with that assertion). This breaks down roughly to:
Is the type of foo compatible with the type of bar?
If we decide to not validate we should explain it in the docs with a sentence and consider closing the gap in a later version. |
So am I :)
It will be regenerated on every access. Even if the resolver is cached, only the dict/list output is cached, the node creation (and associated validation) will still happen dynamically.
Regardless of caching, the modifications will be lost next time the node is accessed, because What should be the expected behavior is definitely a good question. I don't think it can make sense to allow modifications of such dynamically-generated configs, because the key we are overriding might not even exist anymore next time we access it. What if we made them read-only?
No, because the cache is only caching the generated dict/list. So the interpolations will still be resolved dynamically regardless of whether it's cached or not.
Not sure what you mean by "shallow validation".
Ok, and I would suggest to decide not to validate in 2.1 (because as you said, "this is pretty far down the rabbit hole"). |
Okay. This could be a perf hit but I think it's okay. hopefully people will not use it to return massive dicts.
Making then read-only seems like a good idea.
I mean we should not re-create the whole node, must validate the the type match at the top level. @dataclass
class Config:
user : User
admin: User = II("${user}")
cfg : Config = OmegaConf.structured(Config)
cfg.admin.name = "root" # this is perfectly legal and should mutate user. In the above case, we should not re-generate the node on access. we should return the user node. |
Just a note that I went with option (2) in 21685e3 (because it's the least disruptive). Open to considering other options though. |
Ok thanks for the clarification, yes I agree with you. @dataclass
class Config:
user : User = User()
admin: User = II("user")
cfg : Config = OmegaConf.structured(Config)
cfg.admin.name = "root" # this is perfectly legal and should mutate user.
assert cfg.user.name == "root" |
Done in e25937d |
I owe you a full pass to see all your changes, I didn't respond to most of your comments/questions yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the code, didn't get to the tests.
Please read all comments before fixing things (one of them is asking to split the PR).
@@ -472,6 +472,19 @@ simply use quotes to bypass character limitations in strings. | |||
'Hello, World' | |||
|
|||
|
|||
Custom resolvers can return lists or dictionaries, that are automatically converted into DictConfig and ListConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to think that we need a dedicated page for interpolations (not as a part of this PR).
Interpoaltion docs are now about 300 lines.
We can keep the really minimal use cases in usage and mention more advanced use cases and point to the interpolations doc page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to think that we need a dedicated page for interpolations (not as a part of this PR).
Added to #535 => resolving
Edit 2021-11-22: actually unresolving as otherwise linking to this discussion doesn't work
) | ||
# If the output is not a Node already (e.g., because it is the output of a | ||
# custom resolver), then we will need to wrap it within a Node. | ||
must_wrap = not isinstance(resolved, Node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is now a kilometer long.
Refactor it to reduce the size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d09277d
I had a comment I can't find now for some reason: # 1.py
from typing import *
x: Any = None # fine
y: Optional[Any] = None # also fine, but redundant.
Since AnyNode is modeling Any, I think it should not have an optional flag. This will eliminate some of the confiusion |
Agreed, I included this change in #592 |
This commit fixes several issues: * For nested resolvers (ex: `${f:${g:x}}`), intermediate resolver outputs (of `g` in this example) were wrapped in a ValueNode just to be unwrapped immediately, which was wasteful. This commit pushes the node wrapping to the very last step of the interpolation resolution. * There was no type checking to make sure that the result of an interpolation had a type consistent with the node's type (when specified). Now a check is made and the interpolation result may be converted into the desired type (see omry#488). * If a resolver interpolation returns a dict / list, it is now wrapped into a DictConfig / ListConfig, instead of a ValueNode. This makes it possible to generate configs from resolvers (see omry#540) Fixes omry#488 Fixes omry#540
This seems safer unless there is a clear need to instantiate it directly.
Now all exceptions are InterpolationValidationError (or a subclass), and `throw_on_resolution_failure=False` silences them all.
e25937d
to
d09277d
Compare
I rebased on top of master and force-pushed, now that #592 has been merged. I cleaned up a bit the commit history. I will reference the new meaningful commits in the corresponding comments. |
tests/test_interpolation.py
Outdated
# This example specifically test the case where intermediate resolver results | ||
# are not of the same type as the key. | ||
User(name="Bond", age=SI("${cast:int,${drop_last:${drop_last:7xx}}}")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be tested with identity instead? drop_last seems overly complex for what this test is doing.
User(name="Bond", age=SI("${cast:int,${identity:'7'}")),
Maybe even nest two identity calls (not sure what nesting twice is adding to the test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, this would have looked somewhat nicer if we had () syntax support. Maybe for 2.2 :).
${cast:int, ${identity:'7'}
vs
${cast(int, ${identity('7'))}
I would also say that for a single argument, :
is better.
So:
${cast(int, ${identity:'7')}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be tested with identity instead? drop_last seems overly complex for what this test is doing.
User(name="Bond", age=SI("${cast:int,${identity:'7'}")),Maybe even nest two identity calls (not sure what nesting twice is adding to the test).
I wanted to test explicitly the situation where there is an intermediate resolver output (by intermediate, I mean not the final result of the interpolation) that would be invalid if we tried to cast it to the node's type. The motivation was that my first implementation was doing the cast immediately after a resolver was called (even an intermediate one), which was incorrect (so I wanted a test to catch that).
If I just use identity
then the string remains "7"
which is actually valid when cast to the type of the node (integer). In that regard my comment motivating the test is not clear enough => I changed it in a99f776
I could also just remove that test if you feel like the extra complexity isn't worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a combination of casts can do it then :)
User(name="Bond", age=SI("${cast:int,${cast:str,${identity:7}}}")),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, this would have looked somewhat nicer if we had () syntax support. Maybe for 2.2 :).
${cast:int, ${identity:'7'} vs ${cast(int, ${identity('7'))}
I would also say that for a single argument,
:
is better.
So:${cast(int, ${identity:'7')}
I guess the version I like most is this:
${cast(int, identity('7'))}
As in, once we are in an interpolation we no longer need the ${}, we are already in function mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a combination of casts can do it then :)
User(name="Bond", age=SI("${cast:int,${cast:str,${identity:7}}}")),
If I'm not mistaken the resolver outputs here are all either 7
(the integer) or "7"
(the string) => both can be cast to int so that wouln't address the specific issue I wanted to pinpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess I don't fully understand the error you are defending against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an error with the approach from https://github.com/omry/omegaconf/pull/506/files
In this approach, the call to validate_and_convert()
happened immediately after a resolver was called, so in a chain of resolvers all outputs were validated (and possibly converted) according to the node's type. But that's wrong: it's only the final result that needs to be checked.
Congratulations 🎉. DeepCode analyzed your code in 2.394 seconds and we found no issues. Enjoy a moment of no bugs ☀️. 👉 View analysis in DeepCode’s Dashboard | Configure the bot |
OmegaConf.register_new_resolver("list", lambda: cfg) | ||
c = OmegaConf.create({"x": "${list:}", "y": -1}) | ||
assert isinstance(c.x, ListConfig) | ||
assert c.x == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also add a quick check to confirm those are readonly (here and in dict).
assert c.x == expected | |
assert c.x == expected | |
assert x._get_node("x")._get_flag("readonly") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 572e125
I ended up adding a utility function since it can't be a one-liner because of mypy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome.
Can you take a last look at the description of the PR and ensure it's still correct after all the changes? I will use it as is in the commit message. |
Yes it looks good. I just added the fact that resolver-generated configs are read-only. |
Validate and convert interpolation results to their intended type
This PR fixes several issues:
For nested resolvers (ex:
${f:${g:x}}
), intermediate resolveroutputs (of
g
in this example) were wrapped in a ValueNode just tobe unwrapped immediately, which was wasteful. This commit pushes the
node wrapping to the very last step of the interpolation resolution.
There was no type checking to make sure that the result of an
interpolation had a type consistent with the node's type (when
specified). Now a check is made and the interpolation result may be
converted into the desired type (see Overriding configuration using an environment doesn't trigger type checks when merging into Structured config #488).
If a resolver interpolation returns a dict / list, it is now wrapped
into a DictConfig / ListConfig, instead of a ValueNode. This makes it
possible to generate configs from resolvers (see Allow resolvers to output a dict / list to generate a DictConfig / ListConfig #540). These configs
are read-only since they are being re-generated on each access.
Fixes #488
Fixes #540